UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer
Traditional channel-wise pruning methods, which reduce the number of network
channels, struggle to effectively prune efficient CNN models built on
depth-wise convolutional layers and certain efficient modules, such as the
popular inverted residual blocks. Prior depth pruning methods, which reduce
network depth, are also unsuitable for some efficient models because of the
normalization layers they contain. Moreover, fine-tuning a subnet obtained by
directly removing activation layers corrupts the original model weights,
preventing the pruned model from achieving high performance. To address these issues, we
propose a novel depth pruning method for efficient models. Our approach
introduces a novel block pruning strategy and a progressive training method for the
subnet. Additionally, we extend our pruning method to vision transformer
models. Experimental results demonstrate that our method consistently
outperforms existing depth pruning methods across various pruning
configurations. Applying our method to ConvNeXtV1, we obtained three pruned
models that surpass most SOTA efficient models at comparable inference
performance. Our method also achieves state-of-the-art pruning performance on
vision transformer models.
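
The abstract notes that abruptly deleting activation layers corrupts the
pretrained weights, which is what motivates progressive training of the
subnet. As a rough illustration only (the paper's exact mechanism is not given
in the abstract), the sketch below fades an activation toward the identity by
annealing a blending coefficient, so a block can later be simplified without a
sudden change to the function the weights were trained for. The module name,
the linear annealing schedule, and the PyTorch framing are assumptions for
illustration, not the authors' implementation.

import torch
import torch.nn as nn

class FadingActivation(nn.Module):
    """Hypothetical helper: y = alpha * act(x) + (1 - alpha) * x.
    Annealing alpha from 1 to 0 during subnet fine-tuning removes the
    nonlinearity gradually instead of deleting it in one step."""

    def __init__(self, act: nn.Module):
        super().__init__()
        self.act = act
        self.register_buffer("alpha", torch.tensor(1.0))

    def set_alpha(self, value: float) -> None:
        self.alpha.fill_(value)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.alpha * self.act(x) + (1.0 - self.alpha) * x

# Example annealing loop (assumed linear schedule):
# for step in range(total_steps):
#     for m in subnet.modules():
#         if isinstance(m, FadingActivation):
#             m.set_alpha(max(0.0, 1.0 - step / total_steps))
#     train_one_step(subnet)   # hypothetical training step
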
MLPerf Inference Benchmark
Machine-learning (ML) hardware and software system demand is burgeoning.
Driven by ML applications, the number of different ML inference systems has
exploded. Over 100 organizations are building ML inference chips, and the
systems that incorporate existing models span at least three orders of
magnitude in power consumption and five orders of magnitude in performance;
they range from embedded devices to data-center solutions. Fueling the hardware
are a dozen or more software frameworks and libraries. The myriad combinations
of ML hardware and ML software make assessing ML-system performance in an
architecture-neutral, representative, and reproducible manner challenging.
There is a clear need for industry-wide standard ML benchmarking and evaluation
criteria. MLPerf Inference answers that call. In this paper, we present our
benchmarking method for evaluating ML inference systems. Driven by more than 30
organizations as well as more than 200 ML engineers and practitioners, MLPerf
prescribes a set of rules and best practices to ensure comparability across
systems with wildly differing architectures. The first call for submissions
garnered more than 600 reproducible inference-performance measurements from 14
organizations, representing over 30 systems that showcase a wide range of
capabilities. The submissions attest to the benchmark's flexibility and
adaptability.
Comment: ISCA 202
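
As a rough illustration of the kind of architecture-neutral measurement such a
benchmark standardizes, the sketch below times one-query-at-a-time inference
and reports mean latency, a tail percentile, and throughput. This is not the
MLPerf LoadGen API or its rules; the function names, the warm-up count, and
the 90th-percentile choice are illustrative assumptions only.

import time
import statistics
from typing import Any, Callable, Sequence

def single_stream_report(infer: Callable[[Any], Any],
                         samples: Sequence[Any],
                         warmup: int = 10) -> dict:
    """Toy single-stream-style measurement: issue queries one at a time,
    record per-query latency, and summarize. 'infer' and 'samples' stand in
    for the system under test and its preprocessed inputs."""
    for s in samples[:warmup]:               # unmeasured warm-up queries
        infer(s)
    latencies = []
    for s in samples:
        start = time.perf_counter()
        infer(s)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    p90 = latencies[int(0.9 * (len(latencies) - 1))]
    return {
        "mean_ms": 1000.0 * statistics.mean(latencies),
        "p90_ms": 1000.0 * p90,              # tail-latency style summary
        "throughput_qps": len(latencies) / sum(latencies),
    }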